Configure Agent #524
Conversation
internal/events/handler.go
Outdated
if err != nil {
    return err
    h.cfg.Logger.Error(err, "error storing dataplane configuration")
    // FIXME(kate-osborn): Update status to indicate that the gateway is not accepted or programmed.
@pleshakov how do you feel about making this task a separate story to reduce the scope of this PR?
this makes sense to me!
c.logger.Info("Received agent connect request", "message ID", cmd.GetMeta().GetMessageId())
c.logger.Info("Received agent connect request")

c.logger = c.logger.WithValues("podName", req.GetMeta().DisplayName)
@pleshakov can you think of any reason adding a value to the logger during runtime would cause a problem? I think this is ok, but I might be missing something.
this looks like a data race (multiple goroutines have access to c.logger and at least one of them is writing)
perhaps we can refactor the connection so that when it starts, it always waits for the connect to finish, it then extracts the pod name, updates the logger and starts the connection?
perhaps we can refactor the connection so that when it starts, it always waits for the connect to finish, it then extracts the pod name, updates the logger and starts the connection?
Intriguing idea. I'll play around with it.
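A minimal sketch of that ordering (waitForConnect and run are hypothetical method names, not the actual code): the logger is only written before any goroutine that reads it is started, so there is no race.

func (c *connection) start(ctx context.Context) error {
    // Block until the agent's connect request arrives.
    req, err := c.waitForConnect(ctx)
    if err != nil {
        return err
    }

    // Safe to write: no other goroutine is reading c.logger yet.
    c.logger = c.logger.WithValues("podName", req.GetMeta().DisplayName)

    // Only now start the goroutines that serve the connection and read c.logger.
    return c.run(ctx)
}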
)

const (
    // TODO: do we need another file mode for config files?
@pleshakov thoughts?
do we expect manual changes to the files (i.e. changes not made by the agent)? if not, I think we should make all of them (conf and secrets) 0o400
No, we don't expect manual changes. I'll change them all to 0o400.
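A sketch of what the constants could end up looking like (the constant names are assumptions): with no manual edits expected, config files and secrets can share the same read-only mode.

const (
    // configFileMode is the mode for generated nginx config files.
    configFileMode = 0o400
    // secretFileMode is the mode for TLS secret files.
    secretFileMode = 0o400
)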
internal/state/secrets/secrets.go
Outdated
// WriteAllRequestedSecrets writes all requested secrets to disk.
WriteAllRequestedSecrets() error
// GetAllRequestedSecrets returns all requested secrets as Files.
GetAllRequestedSecrets() []File
I left the WriteAllRequestedSecrets method because I'm toying with the idea of implementing the ConfigStorer, Subject, and Observer interfaces for configuring an nginx instance in the same Pod. That would allow us to benchmark the two approaches. What do you think?
Either way, I still need to rename this interface, or separate the Get functionality from the Write.
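One possible shape for that split, purely as an illustration (the interface names here are hypothetical): keep the read path the agent needs separate from the write-to-disk path used when nginx runs in the same Pod.

type SecretFileGetter interface {
    // GetAllRequestedSecrets returns all requested secrets as Files.
    GetAllRequestedSecrets() []File
}

type SecretFileWriter interface {
    // WriteAllRequestedSecrets writes all requested secrets to disk.
    WriteAllRequestedSecrets() error
}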
That would allow us to benchmark the two approaches.
what kind of benchmarks do you have in mind?
Not sure yet... it would probably be an e2e test where we benchmark the time it takes to update the config in the dataplane.
However, I'm worried about lock contention. If we have many agents, will we create a bottleneck in the ConfigStore, with each agent trying to get the latest config?
is there a way to quickly test it?
We could address this by using the push variation of the Observer pattern. In this variation, the ConfigStore would send the latest config on every Notify() call to all its observers (agent connections). This would move the locking from the ConfigStore to the individual agent connections. Each connection would save the latest config and queue a config update. This would require a lock for the config (could use atomic.Value) to prevent a race condition between a connection reader and writer. This variation could make the connection logic more complex but could be better overall.
I believe the current approach has a locking problem (mentioned it in an inline comment), so additional locking logic is required either way.
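A rough sketch of the push variation on the connection side (the field and method names -- Update, applyLoop, apply -- are assumptions, and only the relevant fields are shown): the store pushes the latest config into each connection, which records it in an atomic.Value and queues at most one pending update.

type connection struct {
    latest  atomic.Value  // most recent *NginxConfiguration pushed by the ConfigStore
    updated chan struct{} // buffered with capacity 1; acts as a "config pending" flag
}

// Update is what the ConfigStore would call on every Notify().
func (c *connection) Update(cfg *NginxConfiguration) {
    c.latest.Store(cfg)
    select {
    case c.updated <- struct{}{}:
    default: // an update is already queued; the loop will pick up the latest config anyway
    }
}

func (c *connection) applyLoop(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            return
        case <-c.updated:
            cfg := c.latest.Load().(*NginxConfiguration)
            c.apply(ctx, cfg) // apply the config to the agent; details omitted
        }
    }
}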
Perhaps another approach here could be for agents to create a one-time watch. Something like below (note: cancellation is omitted for simplicity):

var latestCfgID int
for {
    cfg := configStore.GetConfigWithGreaterID(latestCfgID)
    push(cfg)
    latestCfgID = cfg.ID
}

That will also remove the need for Register and Remove methods.
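A sketch of how GetConfigWithGreaterID could be implemented (illustrative only; it assumes NginxConfiguration carries an ID field): a mutex plus a sync.Cond that wakes every waiting connection when a newer config is stored.

type ConfigStore struct {
    mu   sync.Mutex
    cond *sync.Cond
    cfg  *NginxConfiguration
}

func NewConfigStore() *ConfigStore {
    s := &ConfigStore{}
    s.cond = sync.NewCond(&s.mu)
    return s
}

// Store saves a new config and wakes up every waiting connection.
func (s *ConfigStore) Store(cfg *NginxConfiguration) {
    s.mu.Lock()
    s.cfg = cfg
    s.mu.Unlock()
    s.cond.Broadcast()
}

// GetConfigWithGreaterID blocks until a config with an ID greater than lastID is stored.
func (s *ConfigStore) GetConfigWithGreaterID(lastID int) *NginxConfiguration {
    s.mu.Lock()
    defer s.mu.Unlock()
    for s.cfg == nil || s.cfg.ID <= lastID {
        s.cond.Wait()
    }
    return s.cfg
}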
c.sendConfigApplyResponse(ctx, car)
}

func (c *connection) sendConfigApplyResponse(ctx context.Context, response configApplyResponse) {
"send" is a bit misleading here, because we're not sending anything to the agent
    }
}

func (u *NginxConfigBuilder) generateZippedFile(dir directory) (*proto.ZippedFile, error) {
we're not using the receiver u at all inside the function, right?
internal/nginx/agent/config_store.go
Outdated
// ConfigStore implements the observer.Subject interface,
// so that it can notify the agent observers when the configuration changes.
// ConfigStore is thread-safe.
type ConfigStore struct {
would it make sense for this type to be more generalized, so that it doesn't deal with the NGK-specific configuration? This way, it can potentially be reused in other projects, to configure a set of data plane replicas with the same configuration.
How would that work? Would we just make this generic and have it store and return the any type?
I'm worried about extracting the building piece of this. I only want to build the nginx agent config once because it involves costly operations like checksums.
I guess I could perform the dataplane.Configuration -> agent.NginxConfiguration before storing the config and notifying the observers.
I guess I could perform the dataplane.Configuration -> agent.NginxConfiguration before storing the config and notifying the observers.
yep
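If we ever did want to generalize the store itself, a type-parameterized version is one option. A sketch (illustrative only; buildNginxConfiguration is a stand-in for whatever performs the conversion): the caller does the dataplane.Configuration to agent.NginxConfiguration conversion once, before calling Set, so the expensive work (checksums, etc.) still happens only once.

type Store[T any] struct {
    mu     sync.RWMutex
    latest T
}

func (s *Store[T]) Set(latest T) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.latest = latest
}

func (s *Store[T]) Get() T {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.latest
}

// Usage: build the agent config once, then store it.
// store := &Store[*agent.NginxConfiguration]{}
// store.Set(buildNginxConfiguration(dataplaneCfg))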
internal/nginx/agent/config_store.go
Outdated
}

// Notify notifies all registered observers.
func (a *ConfigStore) Notify() {
does this method need to be public?
nope, my bad
@pleshakov this is ready for another review -- still a draft. Since it's a significant refactor and it's been a while since your last review, would it be easier to open a fresh PR?

that makes sense

Will reopen
This is a draft PR for pushing nginx configuration to connected agents.
This is not ready for a full review: some tests are broken, and others are missing altogether. There are still some open questions that are tagged as TODOs and others that I will call out in the comments. I would appreciate any feedback/opinions on these as you see them.
I'm looking for feedback on the overall design. The agent configuration is implemented using the Observer design pattern, specifically, the pull variation of Observer. The pull variation forces the ConfigStore to be thread-safe, as there will be many readers (agent connections) and one writer. This variation simplifies the update logic for agent connections because they pull the latest config whenever they are done waiting for a previous config apply to complete.

However, I'm worried about lock contention. If we have many agents, will we create a bottleneck in the ConfigStore, with each agent trying to get the latest config?

We could address this by using the push variation of the Observer pattern. In this variation, the ConfigStore would send the latest config on every Notify() call to all its observers (agent connections). This would move the locking from the ConfigStore to the individual agent connections. Each connection would save the latest config and queue a config update. This would require a lock for the config (could use atomic.Value) to prevent a race condition between a connection reader and writer. This variation could make the connection logic more complex but could be better overall.
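To make the pull variation concrete, a rough sketch of the shape of the store (names are approximate, not the exact code in this PR): Notify only signals the registered observers, and each agent connection calls Get when it is ready to apply the next config.

type Observer interface {
    // Update tells the observer that new configuration is available.
    Update()
}

type ConfigStore struct {
    mu        sync.RWMutex
    latest    *NginxConfiguration
    observers []Observer
}

// Store is called by the single writer (the event handler).
func (s *ConfigStore) Store(cfg *NginxConfiguration) {
    s.mu.Lock()
    s.latest = cfg
    s.mu.Unlock()
}

// Get is called by every agent connection; with many agents this RWMutex is the
// potential contention point described above.
func (s *ConfigStore) Get() *NginxConfiguration {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.latest
}

func (s *ConfigStore) Register(o Observer) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.observers = append(s.observers, o)
}

// Notify signals all observers; each pulls the latest config via Get when it is ready.
func (s *ConfigStore) Notify() {
    s.mu.RLock()
    defer s.mu.RUnlock()
    for _, o := range s.observers {
        o.Update()
    }
}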